A STRATEGY FOR NON-STRICTLY CONVEX TRANSPORT COSTS AND

THE EXAMPLE OF $\|x - y\|^p$ IN $\mathbb{R}^2$

Abstract. This paper deals with the existence of optimal transport maps for some optimal transport problems with a convex but non-strictly convex cost. We give a decomposition strategy to address this issue. As part of our strategy, we have to treat some transport problems, of independent interest, with a convex constraint on the displacement. As an illustration of our strategy, we prove existence of optimal transport maps in the case where the source measure is absolutely continuous with respect to the Lebesgue measure and the transportation cost is of the form $h(\|x - y\|)$ with $h$ strictly convex increasing and $\|\cdot\|$ an arbitrary norm in $\mathbb{R}^2$.



1. Introduction 

Given two probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$ and a transport cost $c : \mathbb{R}^d \to \mathbb{R}$, the corresponding Monge problem consists in minimizing the average transport cost $\int_{\mathbb{R}^d} c(x - T(x))\,\mu(dx)$ among all transport maps $T$, i.e. maps pushing forward $\mu$ to $\nu$ (which as usual is denoted by $T_\#\mu = \nu$). It is a highly nonconvex problem (whose admissible set may even be empty if $\mu$ has atoms, for instance) and it is therefore relaxed to the Monge-Kantorovich problem, which consists in minimizing $\int_{\mathbb{R}^d \times \mathbb{R}^d} c(x - y)\,\gamma(dx, dy)$ over $\Pi(\mu, \nu)$, the set of transport plans, i.e. of probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ having $\mu$ and $\nu$ as marginals. To prove existence of an optimal transport map one thus aims to prove that there is an optimal plan $\gamma$ (existence of such plans holds under very mild assumptions since the Monge-Kantorovich problem is linear) that is in fact induced by a transport map, i.e. of the form $\gamma = (\mathrm{id}, T)_\#\mu$. To achieve this goal, one usually relies heavily on the dual problem, which consists in maximizing $\int_{\mathbb{R}^d} \phi\,d\mu + \int_{\mathbb{R}^d} \psi\,d\nu$ subject to the constraint that $\phi(x) + \psi(y) \le c(x - y)$. An optimal pair $(\phi, \psi)$ for the dual is called a pair of Kantorovich potentials. It is very well-known (under reasonable assumptions on the measures) that when $c$ is strictly convex this strategy gives an optimal transport map (see [12] or Section 2 below, where the argument is briefly recalled; we also refer to the book [14] for a general overview and recent developments of optimal transport theory).
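The remark that the admissible set of the Monge problem may be empty when $\mu$ has atoms can be checked on a toy discrete example. The following is a minimal sketch, not taken from the paper: the helper names `pushforward` and `transport_maps` are hypothetical, measures are represented as dictionaries from atoms to masses, and the search is a plain brute force over all maps between the supports.

```python
from fractions import Fraction
from itertools import product


def pushforward(mu, T):
    """Image measure T#mu of a discrete measure mu (dict: atom -> mass)."""
    nu = {}
    for x, mass in mu.items():
        y = T(x)
        nu[y] = nu.get(y, 0) + mass
    return nu


def transport_maps(mu, nu):
    """All maps T from supp(mu) to supp(nu) with T#mu = nu (brute force)."""
    xs, ys = list(mu), list(nu)
    maps = []
    for images in product(ys, repeat=len(xs)):
        table = dict(zip(xs, images))
        if pushforward(mu, table.get) == nu:
            maps.append(table)
    return maps


half = Fraction(1, 2)
# A single atom cannot be split by a map: no transport map exists.
no_maps = transport_maps({0: Fraction(1)}, {0: half, 1: half})
# Two atoms of mass 1/2 sent to two atoms of mass 1/2: two admissible maps.
two_maps = transport_maps({0: half, 1: half}, {1: half, 2: half})
```

In the first case the Kantorovich relaxation still has an admissible plan (the product measure), which is the reason for relaxing the problem in the first place.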

It is also well-known that lack of strict convexity makes the existence of an optimal transport 
much more delicate. Even the important case (originally considered by Monge) where c is a norm 
was well understood only in recent years ([TT], [3], [3J, [2J, [5], [5J, [S], [Z]). In the case of the
Euclidean norm, for example, the direction of the displacement is determined by a Kantorovich 
potential and transport only takes place on a set of segments called transport rays. On the one 
hand, the lack of strict convexity in the radial direction gives rise to an indeterminacy of the 
displacement length, but on the other hand, the problem on transport rays is one dimensional and 
then much simpler. These observations lead to a proof strategy, originally due to Sudakov ([13]), consisting in reducing to a one-dimensional problem on each transport ray (solved by the monotone transport, for instance) and then gluing the solutions together. The most involved part of the full proof consists in proving that the transport rays have enough regularity so that the disintegrated measures on such rays are non-atomic (see [3] for such a proof).
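The loss of uniqueness caused by a lack of strict convexity is already visible in one dimension. The sketch below is an illustration under simplifying assumptions and is not part of the paper: it uses equal-weight atoms instead of an absolutely continuous source, so that transport maps are permutations, and the hypothetical helper `optimal_assignments` simply enumerates them.

```python
from itertools import permutations


def optimal_assignments(xs, ys, cost, tol=1e-12):
    """Return the minimal total cost and every optimal pairing xs[i] -> ys[perm[i]]."""
    best, minimizers = None, []
    for perm in permutations(range(len(ys))):
        total = sum(cost(x - ys[j]) for x, j in zip(xs, perm))
        if best is None or total < best - tol:
            best, minimizers = total, [perm]
        elif abs(total - best) <= tol:
            minimizers.append(perm)
    return best, minimizers


xs, ys = [0, 1], [1, 2]
# c(z) = |z| is convex but not strictly convex: two optimal maps.
cost_abs, maps_abs = optimal_assignments(xs, ys, abs)
# c(z) = z^2 is strictly convex: only the monotone map is optimal.
cost_sq, maps_sq = optimal_assignments(xs, ys, lambda z: z * z)
```

For $c(z) = |z|$ all displacements $x - y$ are non-positive, so they lie in a region where the cost is affine and the total cost does not depend on the pairing; this is the indeterminacy that the decomposition strategy has to resolve.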

In the present paper we consider a convex but not strictly convex $c$. We propose a decomposition strategy that takes advantage of the following fact: whenever the displacement is not fully determined, it lies in some face of $c$; on such a face the cost is affine, and it is therefore unchanged if we replace the transport plan by another one which has the same marginals and satisfies the further constraint that the displacement belongs to the face. In spirit, our strategy can be compared to Sudakov's, but it is different since there is no real analogue of transport rays here. Instead, our strategy, detailed in Section 2, involves "face restricted problems", that is, optimal transport problems with convex constraints on the displacement. Such constrained problems have, we believe, their own interest and motivation (e.g. due to connections with $L^\infty$ transportation problems as studied in [10]) and we will address some of them in Section 3. We will avoid here the subtle disintegration arguments needed to glue together the face restricted problems in general, but will instead illustrate in Section 4 how our strategy easily produces an optimal transport map in the case of $c(z) = h(\|z\|)$ with $h$ strictly convex increasing and $\|\cdot\|$ an arbitrary norm in $\mathbb{R}^2$. The contributions of this paper are then:

• a general decomposition strategy to deal with convex but non strictly convex costs, 

• a contribution to constrained transport problems, 

• the proof of existence of optimal transport maps for a class of costs in $\mathbb{R}^2$.



2. Strategy of decomposition 

In this section we outline a general decomposition strategy to study existence of an optimal 
transport for the Monge-Kantorovich problem 

\[ \min\left\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x - y)\,\gamma(dx, dy) \;:\; \gamma \in \Pi(\mu, \nu) \right\} \]

where $c$ is convex but not strictly convex. Not all of the steps listed below can always be carried out in full generality. In the following sections we will detail some cases where this is possible and illustrate some applications.

The decomposition is based on the following steps: 

• Consider an optimal plan $\gamma$ and look at the optimality conditions by means of a solution $(\phi, \psi)$ of the dual problem. From the fact that

\[ \phi(x) + \psi(y) = c(x - y) \ \text{ on } \operatorname{spt}\gamma \quad\text{and}\quad \phi(x) + \psi(y) \le c(x - y) \]

one deduces that if $x$ is a differentiability point for $\phi$ (which is denoted $x \in \mathrm{Diff}(\phi)$),

\[ \nabla\phi(x) \in \partial c(x - y), \]

which is equivalent to

\[ x - y \in \partial c^*(\nabla\phi(x)). \tag{2.1} \]

Let us define

\[ \mathcal{F}_c := \{\partial c^*(p) : p \in \mathbb{R}^d\}, \]

which is the set of all values of the subdifferential multi-map of $c^*$. These values are those convex sets where the function $c$ is affine, and they will be called faces of $c$.

Thanks to (2.1), for every fixed $x$, all the points $y$ such that $(x, y)$ belongs to the support of an optimal transport plan are such that the difference $x - y$ belongs to the same face of $c$. Classically, when these faces are singletons (i.e. when $c^*$ is differentiable, which is the same as $c$ being strictly convex), this is the way to obtain a transport map, since only one $y$ is admitted for every $x$.

Equation (2.1) also enables one to classify the points $x$ as follows. For every $K \in \mathcal{F}_c$ we define the set

\[ X_K := \{x \in \mathrm{Diff}(\phi) : \partial c^*(\nabla\phi(x)) = K\}. \]

Hence $\gamma$ may be decomposed into several subplans $\gamma_K$ according to the criterion $x \in X_K$, which, as we said, is equivalent to $x - y \in K$.

If the set $\mathcal{F}_c$ is finite or countable, we can define

\[ \gamma_K := \gamma|_{X_K \times \mathbb{R}^d}, \]

which is the simpler case. Actually, in this case, the marginal $\mu_K := (\pi_1)_\#\gamma_K$ (where $\pi_1(x, y) = x$) is a submeasure of $\mu$, and in particular it inherits the absolute continuity from $\mu$. This is often useful for proving existence of transport maps.

If $\mathcal{F}_c$ is uncountable, in some cases one can still rely on a countable decomposition by considering the set $\mathcal{F}_c^{multi} := \{K \in \mathcal{F}_c : K \text{ is not a singleton}\}$. If $\mathcal{F}_c^{multi}$ is countable, then one can separate those $x$ such that $\partial c^*(\nabla\phi(x))$ is a singleton (where a transport map already exists) and look at a decomposition for $K \in \mathcal{F}_c^{multi}$ only.

In some other cases, it could be useful to bundle together different possible $K$'s so that the decomposition is countable, even if coarser. We will give an example of this last type in Section 4.

• This reduces the transport problem to a superposition of transport problems of the type

\[ \min\left\{ \int c(x - y)\,\gamma(dx, dy) \;:\; \gamma \in \Pi(\mu_K, \nu_K),\ \operatorname{spt}(\gamma) \subset \{x - y \in K\} \right\}. \]

The advantage is that the cost $c$ restricted to $K$ is easier to study. For instance, if $K$ is a face of $c$, then $c$ is affine on $K$ and in this case the transport cost does not depend any more on the transport plan.

• If $K$ is a face of $c$ the problem is reduced to finding a transport map from $\mu_K$ to $\nu_K$ satisfying the constraint $x - T(x) \in K$, knowing a priori that a transport plan satisfying the same constraint exists.

In some cases (for example if $K$ is a convex compact set with non-empty interior) this problem may be reduced to an $L^\infty$ transport problem. In fact if one denotes by $\|\cdot\|_K$ the (gauge-like) "norm" such that $K = \{x : \|x\|_K \le 1\}$, one has

\[ \min\left\{ \max\{\|x - y\|_K : (x, y) \in \operatorname{spt}(\gamma)\} \;:\; \gamma \in \Pi(\mu_K, \nu_K) \right\} \le 1 \tag{2.2} \]

and the question is whether the same result would be true if one restricted the admissible set to transport maps only (passing from Kantorovich to Monge, say). The answer would be positive if a solution of (2.2) was induced by a transport map $T$ (which is true if $\mu_K \ll \mathcal{L}^d$ and $K = B(0, 1)$, see [10], but is not known in general). Moreover, the answer is also positive in $\mathbb{R}$, where the monotone transport solves all the $L^p$ problems, and hence the $L^\infty$ one as well.

• A positive answer may also be given in case (and it is actually almost equivalent) one is able to select, for instance by a secondary minimization, a particular transport plan satisfying $\operatorname{spt}(\gamma) \subset \{x - y \in K\}$ which is induced by a map. This leads to the very natural question of solving

\[ \min\left\{ \int \tfrac{1}{2}|x - y|^2\,\gamma(dx, dy) \;:\; \gamma \in \Pi(\mu_K, \nu_K),\ \operatorname{spt}(\gamma) \subset \{x - y \in K\} \right\} \]

or, more generally, transport problems where the cost function involves convex constraints on $x - y$. These problems are studied in Section 3. For instance, in the quadratic case above, we can say that the optimal transport $T$ exists and is given by $T(x) = x - \mathrm{proj}_K(\nabla\phi(x))$ ($\mathrm{proj}_K$ denoting projection on $K$), provided two facts hold:

- $\mu$ is absolutely continuous, so that we can ensure the existence of a (possibly approximate) gradient of $\phi$ $\mu$-a.e. under mild regularity assumptions on $\phi$;

- an optimal potential $\phi$ does actually exist, in a class of functions (Lipschitz, $BV$) which are differentiable almost everywhere, at least in a weak sense.

• In order to apply the study of convex-constrained problems to the original problem with cost $c(x - y)$, the first issue (i.e. absolute continuity) does not pose any problem if the decomposition is finite or countable, while it is non-trivial, and presents the same kind of difficulties as in Sudakov's solution of Monge's problem, in case of disintegration. For interesting papers related to this kind of problem, see for instance 0[6].

• As far as the second issue is concerned, this is much more delicate, since in general there are no existence results for potentials with non-finite costs. In particular, a counterexample has been provided by Caravenna when $c(x, y) = |x - y| + \chi_K(x - y)$ where $K = \mathbb{R}_+ \times \mathbb{R}_+ \subset \mathbb{R}^2$, and it is easily adapted to the case of quadratic costs with convex constraints. On the other hand, it is natural to think that the correct space in which to set the dual problem in Kantorovich theory for this kind of cost would be $BV$, since the constraints on $x - y$ enable one to control the increments of the potentials $\phi$ and $\psi$ in some directions, thus giving some sort of monotonicity. Yet, this is not sufficient to find a bound in $BV$ if an $L^\infty$ estimate is not available as well, and the counterexample that we mentioned, which gives infinite values for both $\phi$ and $\psi$, exactly proves that this kind of estimate is hard to prove.
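To make the notion of face used in the first step of the strategy concrete, here is a small worked example of our own (it is not taken from the paper): the one-dimensional cost $c(z) = |z|$, for which the Legendre transform and its subdifferential can be written explicitly.

```latex
% Worked example (d = 1, c(z) = |z|); an illustration, not part of the original text.
\[
c^*(p) = \sup_{z \in \mathbb{R}} \{ pz - |z| \} =
\begin{cases}
0 & \text{if } |p| \le 1, \\
+\infty & \text{if } |p| > 1,
\end{cases}
\qquad
\partial c^*(p) =
\begin{cases}
\{0\} & \text{if } |p| < 1, \\
[0, +\infty) & \text{if } p = 1, \\
(-\infty, 0] & \text{if } p = -1, \\
\emptyset & \text{if } |p| > 1.
\end{cases}
\]
% The non-empty values are the faces: c is affine on each of them.
\[
\{0\}, \qquad [0, +\infty) \ (\text{where } c(z) = z), \qquad (-\infty, 0] \ (\text{where } c(z) = -z).
\]
```

In this example the condition $x - y \in \partial c^*(\nabla\phi(x))$ only prescribes the sign of the displacement when $\phi'(x) = \pm 1$, and the two non-singleton faces give a finite decomposition of the plan.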

Remark 2.1. An interesting example that could be approached by this strategy is that of crystalline norms (a problem that has already been solved by a different method in [2]). In this case the faces of the cost $c$ are polyhedral cones but, if the supports of the two measures are bounded, we can suppose that they are compact convex polyhedra. This means, thanks to the considerations above, that it is possible to perform a finite decomposition and to reduce the problem to some $L^\infty$ minimizations for norms whose unit balls are polyhedra. In particular, solving the $L^\infty$ problem for crystalline norms is enough to solve the usual $L^1$ optimal transport problem for the crystalline norms.

Remark 2.2. If, on the other hand, one wants a simple example that can be completely solved through this strategy and that works in any dimension, one can look at the cost $c(z) = (|z| - 1)_+^2$, which vanishes for displacements smaller than one. In this case it is easy to see that $\mathcal{F}_c^{multi}$ has only one element, which is given by the Euclidean ball $B(0, 1)$. The associated $L^\infty$ problem has been solved in [10] and this is enough to get the existence of an optimal transport.
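The claim that the unit ball is the only non-singleton face of the cost $c(z) = (|z| - 1)_+^2$ can be verified by a direct computation of $c^*$. The derivation below is our own sketch of this check, using only the radial symmetry of $c$; it is not reproduced from the paper.

```latex
% Legendre transform of c(z) = (|z| - 1)_+^2, written with r = |z| and s = |p|.
\[
c^*(p) = \sup_{r \ge 0} \{ s r - (r - 1)_+^2 \}, \qquad
\text{attained at } r = 1 + \tfrac{s}{2}, \qquad
c^*(p) = |p| + \tfrac{1}{4}|p|^2 .
\]
% Subdifferential of c^*: multivalued only at the origin.
\[
\partial c^*(0) = \overline{B(0, 1)}, \qquad
\partial c^*(p) = \left\{ \left( 1 + \tfrac{|p|}{2} \right) \frac{p}{|p|} \right\} \quad \text{for } p \neq 0 .
\]
```

Hence every face other than the closed unit ball is a single point of norm larger than one, which is consistent with the decomposition into the set where the map is already determined and a single face restricted problem.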

3. Constrained transport problems 

In this section we see two useful examples of transport problems under the constraint $x - y \in K$. Since the cost we use, due to this constraint, is lower semicontinuous but not finitely valued, it is well-known that duality holds (the minimum of the Kantorovich problem coincides with the supremum of the dual one), but existence of optimizers in the dual problem is not guaranteed. As we underlined in Section 2, this is a key point and what we present here will always assume (artificially) that this existence holds true. In Section 4 we will show a relevant example in which this is actually the case.

3.1. Strictly convex costs with convex constraints. We start from the easiest transport problem with convex constraints:

Theorem 3.1. Let $\mu$, $\nu$ be two probability measures on $\mathbb{R}^d$, with $\mu \ll \mathcal{L}^d$, $K$ a closed and convex subset of $\mathbb{R}^d$ and $h : \mathbb{R}^d \to \mathbb{R}$ a strictly convex function. Let $c(z) = h(z) + \chi_K(z)$: then the transport problem

\[ \min\left\{ \int c(x - y)\,\gamma(dx, dy), \ \gamma \in \Pi(\mu, \nu) \right\} \]

admits a unique solution, which is induced by a transport map $T$, provided that a transport plan with finite cost exists and the dual problem admits a solution $(\phi, \psi)$ where $\phi$ is at least approximately differentiable a.e.

Proof. As usual, consider an optimal transport plan $\gamma$ and the potentials $\phi$ and $\psi$. The optimality conditions on optimal transport plans and optimal potentials for this problem read as

\[ \phi(x) + \psi(y) = h(x - y) \quad \text{on } \operatorname{spt}\gamma \subset \{x - y \in K\} \]

and

\[ \phi(x) + \psi(y) \le h(x - y) \quad \text{for all } (x, y) : x - y \in K \]

and, if $\phi$ is differentiable at $x$, they lead to

\[ \nabla\phi(x) \in \partial c(x - y) = \partial h(x - y) + N_K(x - y), \tag{3.1} \]

where $N_K(z)$ is the normal cone to $K$ at $z$. We used the fact that the subdifferential of the sum of $h$ and $\chi_K$ is the sum of their subdifferentials, since $h$ is real-valued and hence continuous. Equation (3.1) is satisfied by the true gradient of $\phi$ if it exists, but it stays true for the approximate gradient if $\phi$ is only approximately differentiable.

Yet, when a vector $l$ and a point $z \in K$ satisfy

\[ l \in \partial h(z) + N_K(z), \]

thanks to the convexity of $h$ and $K$, this gives that $z$ minimizes $K \ni z \mapsto h(z) - l \cdot z$. Since $h$ is strictly convex this gives the uniqueness of $z$, which will depend on $l$. We will denote it by $z(l)$.

In this case, we get $x - y = z(\nabla\phi(x))$, which is enough to identify $y = T(x) := x - z(\nabla\phi(x))$, thus proving existence of a transport map, which is necessarily unique. □
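For completeness, the minimality claim used in the proof can be spelled out in one line. The following is our own sketch of this standard step, writing $l = q + n$ with $q \in \partial h(z)$ and $n \in N_K(z)$ (the names $q$, $n$ and $w$ are introduced here only for the computation).

```latex
% For every w in K: h(w) >= h(z) + q.(w - z) by convexity, and n.(w - z) <= 0
% by definition of the normal cone N_K(z). With l = q + n this gives
\[
h(w) - l \cdot w
\;\ge\; h(z) + q \cdot (w - z) - l \cdot w
\;=\; h(z) - l \cdot z - n \cdot (w - z)
\;\ge\; h(z) - l \cdot z .
\]
```

Strict convexity of $h$ makes $w \mapsto h(w) - l \cdot w$ strictly convex on the convex set $K$, so the minimizer $z(l)$ is unique.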

Remark 3.2. Notice that, in the case $h(z) = \frac{1}{2}|z|^2$, the point $z(l)$ will be the projection of $l$ on $K$, which gives the nice formula

\[ T(x) = x - \mathrm{proj}_K(\nabla\phi(x)), \]

i.e. a generalization of the usual formula for the optimal transport in the quadratic case.
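The identification of $z(l)$ with a projection can be checked numerically on a simple constraint set. The sketch below is illustrative only: it assumes $K$ is an axis-aligned box in the plane (so that the projection is a coordinatewise clipping), compares it with a brute-force minimization of $\frac12|z|^2 - l \cdot z$ over a grid of $K$, and uses a hypothetical, user-supplied `grad_phi` in place of the gradient of a Kantorovich potential.

```python
def proj_box(l, lo, hi):
    """Euclidean projection of l onto the box K = [lo[0], hi[0]] x [lo[1], hi[1]]."""
    return tuple(min(max(li, a), b) for li, a, b in zip(l, lo, hi))


def z_of_l_grid(l, lo, hi, n=200):
    """Brute-force minimiser of 0.5*|z|^2 - l.z over an (n+1) x (n+1) grid of K."""
    best_val, best_z = None, None
    for i in range(n + 1):
        for j in range(n + 1):
            z = (lo[0] + (hi[0] - lo[0]) * i / n,
                 lo[1] + (hi[1] - lo[1]) * j / n)
            val = 0.5 * (z[0] ** 2 + z[1] ** 2) - (l[0] * z[0] + l[1] * z[1])
            if best_val is None or val < best_val:
                best_val, best_z = val, z
    return best_z


def transport_map(x, grad_phi, lo, hi):
    """T(x) = x - proj_K(grad_phi(x)); grad_phi is an assumed, user-supplied gradient."""
    p = proj_box(grad_phi(x), lo, hi)
    return (x[0] - p[0], x[1] - p[1])


lo, hi = (0.0, 0.0), (1.0, 1.0)
l = (2.0, -0.3)
projected = proj_box(l, lo, hi)        # clipping of l to the unit square
minimiser = z_of_l_grid(l, lo, hi)     # grid search for z(l)
```

The grid search is only a sanity check; for a general closed convex $K$ one would need a projection routine adapted to $K$.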

3.2. Strictly convex costs of one variable and convex constraints. Let $\mu \ll \mathcal{L}^2$ and $\nu$ be two probability measures in $\mathbb{R}^2$, let $K$ be a convex subset of $\mathbb{R}^2$ such that $\mathring{K} \neq \emptyset$ and consider the cost

\[ c(x - y) = h(x_1 - y_1) + \chi_K(x - y), \]

for a function $h : \mathbb{R} \to \mathbb{R}$ increasing and strictly convex.

Theorem 3.3. Assume that there exist $\gamma \in \Pi(\mu, \nu)$ and $\phi$, $\psi$ Lipschitz such that

\[ \phi(x) + \psi(y) \le c(x - y) \quad \forall x, y \tag{3.2} \]

\[ \phi(x) + \psi(y) = c(x - y) \quad \gamma\text{-a.e.} \tag{3.3} \]

Then there exists an optimal transport map for the cost $c$ between $\mu$ and $\nu$.

Remark 3.4. The assumption above implies that $\gamma$ is an optimal plan and the couple $(\phi, \psi)$ is optimal for the dual problem. In order to have the existence of an optimal $\gamma$ it is enough to assume that there exists at least one $\sigma \in \Pi(\mu, \nu)$ such that

\[ \int_{\mathbb{R}^2 \times \mathbb{R}^2} c(x - y)\,\sigma(dx, dy) < +\infty. \]

Again we recall that this assumption is not enough to have the existence of an optimal couple for the dual problem.

Proof. As $\mathcal{L}^2(\mathbb{R}^2 \setminus \mathrm{Diff}(\phi)) = 0$, also $\mu(\mathbb{R}^2 \setminus \mathrm{Diff}(\phi)) = 0$; then we can restrict our attention to the points of differentiability of $\phi$. For each $(x, y) \in \operatorname{spt}(\gamma)$ such that $x \in \mathrm{Diff}(\phi)$, by (3.2) and (3.3) we obtain

\[ \nabla\phi(x) \in h'(x_1 - y_1)\,e_1 + N_K(x - y). \tag{3.4} \]

By the convexity of the functions involved, (3.4) is equivalent to

\[ x - y \in \arg\min_z \{h(z_1) - \nabla\phi(x) \cdot z + \chi_K(z)\}. \]

Let $S$ be the set of those $x$ such that the argmin set above is a singleton,

\[ \arg\min_z \{h(z_1) - \nabla\phi(x) \cdot z + \chi_K(z)\} = \{p\}; \]

then $y$ is uniquely determined by

\[ y = x - p. \]

Let us consider a decomposition of $\gamma$ in two parts: $\gamma = \gamma|_{S \times \mathbb{R}^2} + \gamma|_{S^c \times \mathbb{R}^2}$. The first part of the decomposition is already supported on the graph of a Borel map. As far as the second part is concerned, we will prove the existence of a transport map which gives the same cost and the same marginals. This will be done by a sort of one-dimensional decomposition according to the following observations.

Whenever the argmin set above is not a singleton then, by the convexity of the function $h(z_1) - \nabla\phi(x) \cdot z$, it is a convex subset of $K$. Even more, by the strict convexity of $h$ there exists a number $m(\nabla\phi(x))$ such that

\[ \arg\min_z \{h(z_1) - \nabla\phi(x) \cdot z + \chi_K(z)\} \subset \{z \cdot e_1 = m(\nabla\phi(x))\} \cap K. \]

We claim that if $\arg\min_z \{h(z_1) - \nabla\phi(x) \cdot z + \chi_K(z)\}$ has more than one element then $x$ is a local maximum of $\phi$ on the line $x + te_2$. In fact assume that there exist two points $y = (m(\nabla\phi(x)), y_2)$ and $\bar{y} = (m(\nabla\phi(x)), \bar{y}_2)$ such that $x - y \in \arg\min_z \{h(z_1) - \nabla\phi(x) \cdot z + \chi_K(z)\}$ and $x - \bar{y} \in \arg\min_z \{h(z_1) - \nabla\phi(x) \cdot z + \chi_K(z)\}$, and assume without loss of generality that $\bar{y}_2 < y_2$. As $x - y \in K$, also $x + te_2 - y \in K$ for small and positive $t$ (because it belongs to the segment $[x - y, x - \bar{y}]$), then

\[ \phi(x + te_2) + \psi(y) \le h(x_1 - m(\nabla\phi(x))), \]
\[ \phi(x) + \psi(y) = h(x_1 - m(\nabla\phi(x))), \]

and subtracting the second equation from the first

\[ \phi(x + te_2) - \phi(x) \le 0. \]

The same inequality is obtained for small, negative $t$ using $\bar{y}$. Introduce now the set $M_n = \{x \mid \phi(x + te_2) \le \phi(x) \ \forall t \in [-\frac{1}{n}, \frac{1}{n}]\}$. The set $M_n$ is closed. Let $M_n^i := M_n \cap \{x : x \cdot e_2 \in [\frac{i}{n}, \frac{i+1}{n}]\}$; as a consequence of local maximality, $\phi$ is vertically constant on $M_n^i$. There exists a function $\tilde{\phi}$ depending only on the variable $x_1$ such that $\tilde{\phi}$ coincides with $\phi$ on $M_n^i$. Then on the set $\mathcal{L}(M_n^i) \cap \mathrm{Diff}(\phi)$, $\nabla\phi = \mathrm{app}\nabla\phi = \mathrm{app}\nabla\tilde{\phi}$ and the latter approximate gradient depends only on $x_1$, which implies that $\nabla\phi$ is vertically constant on a subset of full measure of $M_n^i$. The disjoint union $S^c = \bigcup_{n,i} [S^c \cap (M_n^i \setminus M_{n-1})]$ induces the decomposition

\[ \gamma|_{S^c \times \mathbb{R}^2} = \sum_{n,i} \gamma_{n,i}, \]

where $\gamma_{n,i} := \gamma|_{[S^c \cap (M_n^i \setminus M_{n-1})] \times \mathbb{R}^2}$.

Denote by $\mu_{n,i}$ and $\nu_{n,i}$ the marginals of $\gamma_{n,i}$. Clearly $\mu_{n,i}$ is absolutely continuous with respect to $\mathcal{L}^2$. Consider the disintegration according to $x_1$

\[ \mu_{n,i} = a_{n,i}(x_1) \cdot \mathcal{L}^1 \otimes \mu_{n,i}^{x_1} \]

with $\mu_{n,i}^{x_1} \ll \mathcal{L}^1$ for almost every $x_1$, where $a_{n,i}(x_1) \cdot \mathcal{L}^1$ is the projection of $\mu_{n,i}$ on the first variable (which is absolutely continuous as well). Analogously, we define $\nu_{n,i}^{y_1}$ as the disintegration of $\nu_{n,i}$ with respect to $y_1$. From what we said, it follows that there exists a map $T_{n,i} : \mathbb{R} \to \mathbb{R}$ such that $\operatorname{spt}\gamma_{n,i} \cap \{x : x \cdot e_1 = x_1\} \subset \{(x, y) : y \cdot e_1 = T_{n,i}(x_1)\}$. Notice that $T_{n,i}(x_1)$ coincides with $m(\nabla\phi(x))$ for every $x \in M_n^i$ whose first component is $x_1$.

Also disintegrate

\[ \gamma_{n,i} = a_{n,i}(x_1) \cdot \mathcal{L}^1 \otimes \gamma_{n,i}^{x_1} \]

with $\gamma_{n,i}^{x_1} \in \Pi(\mu_{n,i}^{x_1}, \delta_{T_{n,i}(x_1)} \otimes \nu_{n,i}^{T_{n,i}(x_1)})$. It is then enough to replace $\gamma_{n,i}$ with a new transport plan of the form $(\mathrm{id} \times S_{n,i})_\#\mu_{n,i}$ with the following properties

\[ S_{n,i}(x_1, x_2) = (T_{n,i}(x_1), S_{n,i}^{x_1}(x_2)), \]

\[ (S_{n,i}^{x_1})_\#\mu_{n,i}^{x_1} = \nu_{n,i}^{T_{n,i}(x_1)}, \qquad x_2 - S_{n,i}^{x_1}(x_2) \in K_{x_1 - T_{n,i}(x_1)}, \]

where we denote, for every $t$, the section of $K$ at level $t$ by $K_t := \{s : (t, s) \in K\}$.

Since the $h$-part of the cost only depends on $x_1 - y_1$, and both $\gamma_{n,i}$ and the transport plan issued by the map $S_{n,i}$ realize the same displacements as far as the first coordinates are concerned, this part of the cost does not increase. Moreover, the constraint $(x_1 - y_1, x_2 - y_2) \in K$, which amounts to the requirement that $x_2 - y_2$ belongs to a segment $K_{x_1 - y_1}$, is preserved, thanks to the last condition above. Hence, the two transport plans have exactly the same cost.

We are only left to find, for every $x_1$, the map $S_{n,i}^{x_1}$. For this, it is enough to choose the monotone transport map from $\mu_{n,i}^{x_1}$ to $\nu_{n,i}^{T_{n,i}(x_1)}$. This map is well defined because the first measure is absolutely continuous (in particular there are no atoms). The constraint is preserved because $\gamma_{n,i}^{x_1}$ satisfied it, and the monotone transport map is optimal for any convex transport cost of $x_2 - y_2$.

Notice that the whole map $S_{n,i}$ we are using is measurable since in every ambiguous case we chose the monotone transport map. □
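The last step of the proof relies on the one-dimensional fact that the monotone transport map is optimal for every convex cost of the displacement. A discrete sanity check of this fact is sketched below; it is an illustration under simplifying assumptions (equal-weight atoms instead of an absolutely continuous measure) and the helper names `monotone_map`, `total_cost` and `brute_force_min` are hypothetical.

```python
from itertools import permutations


def monotone_map(xs, ys):
    """Pair the k-th smallest source atom with the k-th smallest target atom."""
    order_x = sorted(range(len(xs)), key=lambda i: xs[i])
    sorted_y = sorted(ys)
    image = [None] * len(xs)
    for rank, i in enumerate(order_x):
        image[i] = sorted_y[rank]
    return image


def total_cost(xs, images, cost):
    """Total cost of sending xs[i] to images[i]."""
    return sum(cost(x - y) for x, y in zip(xs, images))


def brute_force_min(xs, ys, cost):
    """Minimal total cost over all pairings of equal-weight atoms."""
    return min(total_cost(xs, perm, cost) for perm in permutations(ys))


xs, ys = [0.3, 0.1, 0.9], [1.0, 0.2, 0.5]
images = monotone_map(xs, ys)                      # [0.5, 0.2, 1.0]
square = lambda z: z * z
gap = total_cost(xs, images, square) - brute_force_min(xs, ys, square)
# Displacements of the monotone map stay in the segment [-0.25, 0] here.
in_segment = all(-0.25 <= x - y <= 0.0 for x, y in zip(xs, images))
```

A constraint of the form $x_2 - y_2 \in K_t$, with $K_t$ a segment, corresponds to the convex cost given by the indicator of that segment, which is why the monotone map keeps the constraint whenever some plan satisfies it.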

4. Application: $c(x - y) = h(\|x - y\|)$ in $\mathbb{R}^2$

Theorem 4.1. Let $\mu$, $\nu$ be probability measures compactly supported in $\mathbb{R}^2$, with $\mu \ll \mathcal{L}^2$, let $\|\cdot\|$ be an arbitrary norm of $\mathbb{R}^2$ and $h : \mathbb{R}_+ \to \mathbb{R}_+$ a strictly convex and increasing function. Denote by $c$ the function $c(z) = h(\|z\|)$, which is still a convex function. Then the transport problem

\[ \min\left\{ \int c(x - y)\,\gamma(dx, dy), \ \gamma \in \Pi(\mu, \nu) \right\} \]

admits at least a solution which is induced by a transport map $T$.

Proof. The strategy to prove this theorem is almost the one described in Section 2, with the additional trick of bundling together some faces of the convex function $h(\|x - y\|)$. The boundary of the unit ball of the norm $\|\cdot\|$ has a countable number of flat parts (they are countable since each one has positive length, otherwise it would not be a flat part, and we cannot have more than countably many disjoint segments of positive length in the boundary). We call $F_i$, with $i = 1, 2, \dots$, the closure of these flat parts and we then associate to each face $F_i$ the cone $K_i = \{tz : t \ge 0, z \in F_i\}$.
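As a concrete instance of the cones $K_i$, one may take the $\ell^1$ norm in the plane, whose unit ball has four flat parts. The sketch below is our own illustration (the helper names are hypothetical): it tests membership in the cone generated by a flat part with endpoints `a` and `b`, assumed to be ordered counterclockwise, and computes the vector `n` with `n.a = n.b = 1`, so that the norm coincides with the linear function `n.z` on that cone.

```python
def cross(u, v):
    """Two-dimensional cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]


def in_cone(z, a, b):
    """True if z = s*a + t*b with s, t >= 0 (a, b counterclockwise, angle < pi)."""
    return cross(a, z) >= 0 and cross(z, b) >= 0


def face_functional(a, b):
    """Vector n with n.a = n.b = 1; on the cone over [a, b] the norm equals n.z."""
    d = cross(a, b)
    return ((b[1] - a[1]) / d, (a[0] - b[0]) / d)


def l1_norm(z):
    return abs(z[0]) + abs(z[1])


# Flat part of the l^1 unit ball lying in the first quadrant.
a, b = (1.0, 0.0), (0.0, 1.0)
n = face_functional(a, b)                      # (1.0, 1.0)
z = (2.0, 3.0)
linear_value = n[0] * z[0] + n[1] * z[1]       # equals l1_norm(z) since z is in the cone
```

On such a cone the cost $h(\|z\|)$ is a strictly convex function of the single variable `n.z`, which is the structure exploited later in the proof.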

Consider an optimal transport plan $\gamma$ and a pair of Kantorovich potentials $(\phi, \psi)$ (these objects exist since the cost is continuous). From the general theory of optimal transport and what we underlined in Section 2, it follows that if $(x, y) \in \operatorname{spt}(\gamma)$ and $x$ is a differentiability point of $\phi$, then $\nabla\phi(x) \in \partial c(x - y)$. This may be rewritten as

\[ x - y \in \partial c^*(\nabla\phi(x)). \]

We classify the points $x$ according to $\partial c^*(\nabla\phi(x))$. This subdifferential may be either a singleton or a segment (it cannot be a set of dimension two because otherwise $c$ would be affine on a set with non-empty interior, which is in contradiction with the strict convexity of $h$). We call $X_0$ the set of those $x$ such that $\partial c^*(\nabla\phi(x))$ is a singleton. If instead it is a segment, it is a segment of points which share a single vector in the subdifferential of $c$. This means, thanks to the shape of $c$, that it is necessarily a homothety of one face $F_i$:

\[ x - y \in \partial c^*(\nabla\phi(x)) = t(x) F_{i(x)} \subset K_{i(x)}. \]

We denote by $X_i$ the set of points $x$ where $\partial c^*(\nabla\phi(x))$ is a homothety of $F_i$.

The disjoint decomposition given by the sets $X_i$ induces a corresponding decomposition $\gamma = \sum_{i \ge 0} \gamma_i$ where $\gamma_i = \gamma|_{X_i \times \mathbb{R}^2}$. Each sub-transport plan $\gamma_i$ is an optimal transport plan between its marginals $\mu_i$ and $\nu_i$, which are submeasures of $\mu$ and $\nu$, respectively. In particular, each measure $\mu_i$ is absolutely continuous.

We now build a solution to the optimal transport problem for the cost $c$ that is induced by a transport map by modifying $\gamma$ in the following way: $\gamma_0$ is already induced by a transport map since for every $x$ we only have one $y$; every $\gamma_i$ for $i \ge 1$ will be replaced by a new transport plan with the same marginals which does not increase the cost, thanks to Theorem 3.3. Since $\gamma_i$ sends $\mu_i$ to $\nu_i$, is optimal for the cost $c$ and satisfies the additional condition $\operatorname{spt}(\gamma_i) \subset \{(x, y) : x - y \in K_i\}$, it is also optimal for the cost $c(x - y) + \chi_{K_i}(x - y)$. The advantage of this new cost is that $c$ depends on one variable only when restricted to $K_i$, since there exists a new basis $(e_1, e_2)$ such that $\|z\| = z_1$ for every $z \in K_i$ which is written as $(z_1, z_2)$ in this new basis (the direction of the vector $e_1$ being the normal direction to $F_i$). Theorem 3.3 precisely states that the optimal transport problem for a cost which is given by a strictly convex function of one variable plus a constraint imposing that $x - y$ belongs to a given convex set admits a solution induced by a transport map, provided the first measure is absolutely continuous and some $\gamma$, $\phi$ and $\psi$ satisfying equations (3.2) and (3.3) exist. This last assumption is exactly satisfied by taking $\gamma_i$ and the original potentials $\phi$ and $\psi$ for the problem with cost $c$ from $\mu$ to $\nu$, since they satisfy

\[ \phi(x) + \psi(y) \le h(\|x - y\|) \le h(\|x - y\|) + \chi_{K_i}(x - y), \]
\[ \phi(x) + \psi(y) = h(\|x - y\|) = h(\|x - y\|) + \chi_{K_i}(x - y) \quad \text{on } \operatorname{spt}(\gamma_i). \qquad \square \]